Abstract: In this paper, we consider the evaluation of information
divergence and information gain as they apply to a hybrid
random  variable  (i.e.  a  random  variable  which  has  both  discrete 
and  continuous  elements)  for  multi-target  tracking  problems.  In 
particular,  we  develop  a  closed-form  solution  for  the  Cauchy- 
Schwarz  information  divergence  under  the  assumption  that  the 
continuous  element  of  the  random  variable  may  be  represented 
by  a  Gaussian  mixture  distribution  and  present  the  associated 
relationships  for  evaluating  the  Cauchy-Schwarz  information 
gain. The developed information gain relationships are applied to
a 0-1 target tracking problem common to space object tracking
to determine the sensitivity of the information gain to the
probability of detection, the prior probability of object existence,
and the measurement noise.

I.  Introduction 

One  of  the  core  concerns  in  Space  Situational  Awareness 
(SSA)  is  the  maintenance  of  a  catalog  of  tracked  objects.  Since 
the  first  launch  of  artificial  satellites,  the  number  of  objects  in 
orbit  coming  from  new  launches,  decommissioned  satellites, 
and  debris  created  by  collision  of  objects  in  orbit  has  posed  an 
ever  increasing  challenge  to  the  development  of  space  object 
catalogs.  As  of  2006,  there  were  approximately  9000  space 
objects  being  tracked  by  the  U.S.  Space  Surveillance  Network 
and maintained in the satellite catalog [1]. Currently,
approximately 20,000 space objects are being tracked,
with 1000 of those being active objects. Furthermore,
it  is  estimated  that  500,000  objects  with  a  diameter  larger  than 
one  centimeter  are  in  orbit.  These  numbers  will  inevitably 
increase  as  more  objects  are  launched  and  as  more  collisions 
occur.  The  current  number  of  objects  coupled  with  the  rapid 
advances  in  sensor  technology  that  enable  the  detection  of 
larger numbers of objects leads to a need for advanced strategies
for scheduling sensors so as to optimally utilize available
resources  while  maintaining  accurate  catalogs  of  space  objects. 

The current measure of performance for tasking is based upon maximizing the number of observations per prioritized object. Given the scarcity of sensing resources, this metric will fail to consistently acquire objects for a growing number of detections. Mitigating this situation requires a method for dynamically assigning which targets are to be tracked and when they are to be tracked by a subset of the available sensors. The process of dynamic sensor tasking typically employs some measure of the information content of each available sensor-object pair in order to formulate an optimization problem which schedules the sensors in a manner that maximizes the information gained regarding any individual object. In these problems, the actual measurements may provide different types and qualities of data (e.g. line-of-sight data or range data). Additionally, since the sensors in a given network are neither identical nor collocated, their object information content depends on the dynamic environment and on the sensor's location, orientation, and inherent accuracy. Therefore, the amount of information that can be gained on an object is not only a function of the target, but also of the sensor and of the overall problem geometry.

Previous studies have examined the utilization of myopic algorithms for dynamic sensor tasking. For example, Erwin, et al. detailed the implementation of Fisher information as a measure of the information content in orbit determination problems [2]. Subsequently, Williams, et al. extended this approach to incorporate the utilization of the largest Lyapunov exponent in orbit determination problems [3]. Kreucher, et al. examined the general problem of information-based sensor management from an information-theoretic perspective, utilizing the Kullback-Leibler and Rényi divergences to formulate measures of information gain [4]. Extending the work of Kreucher, et al., but in the context of sensor scheduling for antisubmarine warfare, Aughenbaugh and La Cour utilized information gain relationships for the Kullback-Leibler and Rényi divergences to assess the performance of myopic sensor scheduling problems [5]. Extending the work of Aughenbaugh and La Cour, DeMars and Jah developed and investigated the utilization of information gain measures for several classes of information-theoretic divergences for the problem of sensor tasking in uncertain orbital dynamical systems [6].

It  is  important  to  note  that  the  utilization  of  sensor  time  is 
both  scarce  and  expensive.  Decisions  on  whether  to  operate 
a  sensor  in  a  mode  which  optimizes  tracking  capabilities 
versus  a  mode  which  optimizes  detection  capabilities  require 
an  assessment  of  how  much  information  can  be  extracted 
(or gained). This expected information gain will have two
(interdependent) components: one that is continuous in nature
and another that is discrete in nature. Therefore, the expected
information gain associated with each sensor assignment is
of a hybrid nature. An effective approach to solving the SSA
problem must therefore provide rigorous machinery for
quantifying hybrid information gain for optimal sensor allocation.

The  goal  of  this  paper  is  to  investigate  the  Cauchy-Schwarz 
information  divergence  and  its  associated  information  gain 
within the multi-target tracking framework of Finite Set
Statistics (FISST) [7], [8]. Specifically, we develop a
closed-form solution for the Cauchy-Schwarz information divergence,
present  a  method  for  determining  the  associated  information 
gain,  and  apply  the  developments  to  a  multi-target  tracking 
problem. 

The paper is organized as follows: the problem statement
and relevant notation are provided in Section II, the basic
formulation  of  the  Cauchy-Schwarz  divergence  for  multi-target 
problems  and  a  closed-form  solution  are  given  in  Section  III, 
a  discussion  of  the  Cauchy-Schwarz  information  gain  is  given 
in  Section  IV,  some  results  are  presented  in  Section  V,  and 
we  conclude  with  some  remarks  in  Section  VI. 

II.  Problem  Statement 

As opposed to purely discrete or purely continuous Bayesian inference, FISST makes use of set-valued random variables. An example of a set-valued random variable is the state $X = \{x_d, x\}$ in an SSA characterization and tracking inference problem. If we let $W$ be the set of all possible object types, then $x_d \in W$ is the discrete component of the state that describes a space object's type (and, hence, its dynamic model) and $x \in \mathbb{R}^s$ is the continuous component of the $s$-dimensional state (e.g. the position and velocity of an object). In detection and tracking, the system state is $X = (n, \mathbf{X})$, where $n$ is the discrete component of the state that describes the number of objects in the search space and $\mathbf{X}^T = [x_1^T \; x_2^T \; \cdots \; x_n^T] \in \mathbb{R}^{sn}$ describes the positions and velocities of these objects. Notice here the explicit dependence of the dimension of the continuous state space $\mathbb{R}^{sn}$ on the discrete component $n$ of the state. For brevity, we simply write $X = \{x_1, x_2, \ldots, x_n\}$.

Bayes' law for performing a measurement update step takes on exactly the same form in the hybrid FISST approach as it does in purely continuous or purely discrete problems, that is

$$f_{k+1|k+1}(X|Z^{(k+1)}) \propto f_{k+1}(Z_{k+1}|X)\, f_{k+1|k}(X|Z^{(k)}) \tag{1}$$

where $f_{k+1}(Z|X)$ is the multi-target likelihood function that describes the likelihood of getting a measurement $Z_{k+1}$ given the state $X_{k+1}$, and $Z^{(k)} : Z_1, \ldots, Z_k$ is the time sequence of measurement sets up to time $k$. If desired, Eq. (1) can be changed to an equality by dividing the right-hand side by the Bayes' factor, which is given by

$$f_{k+1}(Z_{k+1}|Z^{(k)}) = \int f_{k+1}(Z_{k+1}|X)\, f_{k+1|k}(X|Z^{(k)})\, \delta X \tag{2}$$

Notice that the integral in Eq. (2) is a set integral. For multi-target detection and tracking, a set integral of a scalar-valued set function $g(X)$ is defined to be the integral of $g$ over the continuous component, summed over all possible discrete values [7], [8]

$$\int g(X)\, \delta X = g(X = \emptyset) + \sum_{n=1}^{\infty} \frac{1}{n!} \int g(\{x_1, \ldots, x_n\})\, dx_1 \cdots dx_n \tag{3}$$

where the factorial coefficient takes into account all the different possible orderings of $X$ as evaluated in the function $g$.
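For the 0-1 problem treated later in the paper, the sum in Eq. (3) stops at n = 1, so the set integral reduces to the value on the empty set plus an ordinary integral. The following is a minimal sketch of that special case for a scalar state; the function names, the trapezoidal quadrature, and the example density (existence probability 0.3, unit Gaussian state) are illustrative assumptions and do not come from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Scalar Gaussian pdf with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def set_integral_01(g_empty, g_single, lo, hi, steps=20000):
    """Set integral of Eq. (3) truncated at n = 1.

    Returns g(empty set) plus the integral of g({x}) over [lo, hi], computed
    with the trapezoidal rule. The 1/n! factor equals 1 for n = 1.
    """
    width = (hi - lo) / steps
    total = 0.5 * (g_single(lo) + g_single(hi))
    for i in range(1, steps):
        total += g_single(lo + i * width)
    return g_empty + width * total


# Hypothetical hybrid density: the object exists with probability p and, if it
# exists, its scalar state is Gaussian with mean 0 and variance 1.
p = 0.3
mass = set_integral_01(1.0 - p, lambda x: p * gaussian_pdf(x, 0.0, 1.0), -10.0, 10.0)
print(mass)
```

A generalized pdf should integrate to one under the set integral, which is what the printed value checks for this example.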

In order to develop measures of the information gain available by scheduling measurements, we first consider measures of the directed difference between two generalized pdfs, namely the a priori and a posteriori pdfs, i.e. the generalized pdfs immediately before and after processing measurement data, respectively. Generally speaking, the information divergence is a measure of distance (i.e. similarity or dissimilarity) between two pdfs. Given an information divergence describing the directed distance from $p(X)$ to $q(X)$, denoted by $D[p\|q]$, the "distance" is called a metric if [9]

1) $D[p\|q] \geq 0$ with equality iff $p(X) = q(X)$ (non-negativity and positive definiteness),

2) $D[p\|q] = D[q\|p]$ (symmetry), and

3) $D[p\|r] \leq D[p\|q] + D[q\|r]$ (sub-additivity/triangle inequality).

Information divergences which only satisfy the first condition are referred to as asymmetric divergences, whereas satisfaction of the second condition necessarily removes the restriction of referring to the divergence as asymmetric. However, in this work, asymmetric divergences will be referred to as divergences with the understanding that symmetry is not required for the results to hold. One of the most common information divergences is the Kullback-Leibler divergence, given by [10]

$$D_{KL}[p\|q] = \int p(X) \log \frac{p(X)}{q(X)}\, \delta X \tag{4}$$

which was investigated for multi-target tracking by Uney et al. [11]. The Kullback-Leibler divergence, however, only admits closed-form solutions in special cases, such as for linear Gaussian systems. Motivated by this fact, we consider the Cauchy-Schwarz divergence, which has a closed-form solution for single target tracking frameworks [6].


III.  Cauchy-Schwarz  Information  Divergence


By defining the inner product of two square-integrable functions $p(X)$ and $q(X)$ as $\langle p, q \rangle = \int p(X) q(X)\, \delta X$, the Cauchy-Schwarz inequality may be used to define the Cauchy-Schwarz information divergence as [12]

$$\begin{aligned} D_{CS}[p\|q] &= \frac{1}{2} \log \frac{\left(\int p^2(X)\, \delta X\right) \left(\int q^2(X)\, \delta X\right)}{\left(\int p(X) q(X)\, \delta X\right)^2} \\ &= \frac{1}{2} \log \int p^2(X)\, \delta X + \frac{1}{2} \log \int q^2(X)\, \delta X - \log \int p(X) q(X)\, \delta X \end{aligned} \tag{5}$$

where, for the purposes of this work, $q(X)$ represents the multi-target prior generalized pdf and $p(X)$ represents the multi-target posterior generalized pdf. The Cauchy-Schwarz divergence may also be generalized to a class of information divergences via introduction of a control parameter; this class of information divergences is known as the Gamma divergence, for which the Cauchy-Schwarz divergence is a special case [9]. Additionally, the Cauchy-Schwarz divergence is implicitly related to the quadratic entropy of Rényi, which, for a pdf $r(X)$, is given by [13]

$$H_2 = -\log \int r^2(X)\, \delta X$$

which is of the same form as the first two quantities in Eq. (5). The Cauchy-Schwarz divergence does not satisfy the triangle inequality and therefore cannot be classified as a metric, but it does satisfy the following properties [14]:

1) $D_{CS}[p\|q] \geq 0 \;\; \forall\, p, q$

2) $D_{CS}[p\|q] = 0$ iff $p(X) = q(X)$

3) $D_{CS}[p\|q] = D_{CS}[q\|p]$

4) $D_{CS}[p\|q]$ is additive for independent events

5) $D_{CS}[c\,p\|q] = D_{CS}[p\|q]$ for any $c > 0$

These properties illustrate that the Cauchy-Schwarz divergence is positive semi-definite, symmetric, and scale-invariant. The scale-invariance property is particularly useful, and we will make use of it in the sequel.
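The symmetry and scale-invariance properties can be checked numerically. The sketch below evaluates Eq. (5) for two hybrid densities of the 0-1 type (a value on the empty set plus a single-object density sampled on a uniform grid); the grid, the Riemann-sum quadrature, and the example prior and posterior are assumptions chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Scalar Gaussian pdf with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def cs_divergence(p_empty, p_grid, q_empty, q_grid, h):
    """Cauchy-Schwarz divergence of Eq. (5) between two 0-1 hybrid densities.

    p_empty and q_empty are the values on the empty set; p_grid and q_grid are
    the single-object densities sampled on a common uniform grid of spacing h.
    The integrals are approximated by Riemann sums.
    """
    pp = p_empty ** 2 + h * sum(a * a for a in p_grid)
    qq = q_empty ** 2 + h * sum(b * b for b in q_grid)
    pq = p_empty * q_empty + h * sum(a * b for a, b in zip(p_grid, q_grid))
    return 0.5 * math.log(pp) + 0.5 * math.log(qq) - math.log(pq)


# Hypothetical prior q and posterior p on a scalar state.
h = 0.01
xs = [-10.0 + i * h for i in range(2001)]
q_empty, q_grid = 0.5, [0.5 * gaussian_pdf(x, 0.0, 4.0) for x in xs]
p_empty, p_grid = 0.2, [0.8 * gaussian_pdf(x, 0.5, 1.0) for x in xs]

d_pq = cs_divergence(p_empty, p_grid, q_empty, q_grid, h)
d_qp = cs_divergence(q_empty, q_grid, p_empty, p_grid, h)
d_scaled = cs_divergence(3.0 * p_empty, [3.0 * v for v in p_grid], q_empty, q_grid, h)
d_self = cs_divergence(q_empty, q_grid, q_empty, q_grid, h)
print(d_pq, d_qp, d_scaled, d_self)
```

Scale invariance is what later allows the Bayes normalization constant to be dropped when the posterior is substituted into the divergence.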

For  the  sake  of  brevity  and  ease  of  notation,  we  restrict 
our  attention  to  a  case  in  which  there  can  exist  at  most  one 
object  and  at  most  one  clutter  point  in  the  search  space, 
which  we  refer  to  as  the  “0-1  problem”.  A  summary  of  the 
pertinent  FISST  equations  is  provided  in  the  Appendix,  and  a 
full  treatment  of  the  development  of  the  FISST  equations  for 
the  0-1  problem  is  given  in  Hussein,  et  al.  [15]. 

Since we are considering the 0-1 problem, we need only to account for the possibilities that there is no target, i.e. $X = \emptyset$, and that there is a single target, i.e. $X = \{x\}$. Then, by Eq. (3), the integral terms of Eq. (5) may be written as

$$\int p^2(X)\, \delta X = p^2(\emptyset) + \int p^2(x)\, dx$$

$$\int q^2(X)\, \delta X = q^2(\emptyset) + \int q^2(x)\, dx$$

$$\int p(X) q(X)\, \delta X = p(\emptyset) q(\emptyset) + \int p(x) q(x)\, dx$$

At this point, the prior generalized pdf is associated with $q(X)$ and the posterior generalized pdf is associated with $p(X)$, such that

$$p(\emptyset) = f_{k+1|k+1}(X = \emptyset | Z^{(k+1)})$$

$$p(x) = f_{k+1|k+1}(X = \{x\} | Z^{(k+1)})$$

$$q(\emptyset) = f_{k+1|k}(X = \emptyset | Z^{(k)})$$

$$q(x) = f_{k+1|k}(X = \{x\} | Z^{(k)})$$

which allows the Cauchy-Schwarz divergence for the 0-1 problem to be expressed as

$$\begin{aligned} D_{CS} ={}& \frac{1}{2} \log \Big[ f^2_{k+1|k+1}(X = \emptyset | Z^{(k+1)}) + \int f^2_{k+1|k+1}(X = \{x\} | Z^{(k+1)})\, dx \Big] \\ &+ \frac{1}{2} \log \Big[ f^2_{k+1|k}(X = \emptyset | Z^{(k)}) + \int f^2_{k+1|k}(X = \{x\} | Z^{(k)})\, dx \Big] \\ &- \log \Big[ f_{k+1|k+1}(X = \emptyset | Z^{(k+1)})\, f_{k+1|k}(X = \emptyset | Z^{(k)}) \\ &\qquad\quad + \int f_{k+1|k+1}(X = \{x\} | Z^{(k+1)})\, f_{k+1|k}(X = \{x\} | Z^{(k)})\, dx \Big] \end{aligned} \tag{6}$$

Note that in Eq. (6) we have dropped the functional dependence of the Cauchy-Schwarz divergence on the pdfs for which the divergence is computed, as it is no longer ambiguous which pdfs are the inputs. Recalling the scale-invariance property of the Cauchy-Schwarz divergence and substituting for $f_{k+1|k+1}(X|Z^{(k+1)})$ in Eq. (6) from the Bayes' rule update of Eq. (1) yields

$$\begin{aligned} D_{CS} ={}& \frac{1}{2} \log \Big[ f^2_{k+1}(Z_{k+1} | X = \emptyset)\, f^2_{k+1|k}(X = \emptyset | Z^{(k)}) \\ &\qquad\quad + \int f^2_{k+1}(Z_{k+1} | X = \{x\})\, f^2_{k+1|k}(X = \{x\} | Z^{(k)})\, dx \Big] \\ &+ \frac{1}{2} \log \Big[ f^2_{k+1|k}(X = \emptyset | Z^{(k)}) + \int f^2_{k+1|k}(X = \{x\} | Z^{(k)})\, dx \Big] \\ &- \log \Big[ f_{k+1}(Z_{k+1} | X = \emptyset)\, f^2_{k+1|k}(X = \emptyset | Z^{(k)}) \\ &\qquad\quad + \int f_{k+1}(Z_{k+1} | X = \{x\})\, f^2_{k+1|k}(X = \{x\} | Z^{(k)})\, dx \Big] \end{aligned}$$

Now, we must consider different measurement sets independently. Since we have restricted our attention to the 0-1 problem, three measurement sets are possible:

1) no sensor return, in which case $Z_{k+1} = \emptyset$

2) a single sensor return, in which case $Z_{k+1} = \{z\}$

3) two sensor returns, in which case $Z_{k+1} = \{z_1, z_2\}$

For each case, we apply the 0-1 problem FISST equations that are summarized in the Appendix and developed by Hussein, et al. [15]. Before proceeding, it is useful to define some terms which appear repeatedly. Let $p$ be the prior probability that the object exists at time $k$, $p_D$ be the probability of detection, and $p_F$ be the probability of false alarm. Furthermore, let us define $\mathcal{I}_0$, $\mathcal{I}_1(z)$, and $\mathcal{I}_2(z_1, z_2)$ as

$$\mathcal{I}_0 = \int f^2_{k+1|k}(x | Z^{(k)})\, dx \tag{7}$$

$$\mathcal{I}_1(z) = \int f_{k+1}(z | x)\, f^2_{k+1|k}(x | Z^{(k)})\, dx \tag{8}$$

$$\mathcal{I}_2(z_1, z_2) = \int f_{k+1}(z_1 | x)\, f_{k+1}(z_2 | x)\, f^2_{k+1|k}(x | Z^{(k)})\, dx \tag{9}$$

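For the case of no sensor return ($Z_{k+1} = \emptyset$), it can be shown that

$$\begin{aligned} D_{CS}(Z_{k+1} = \emptyset) ={}& \frac{1}{2} \log \Big[ (1 - p)^2 + p^2 (1 - p_D)^2\, \mathcal{I}_0 \Big] \\ &+ \frac{1}{2} \log \Big[ (1 - p)^2 + p^2\, \mathcal{I}_0 \Big] \\ &- \log \Big[ (1 - p)^2 + p^2 (1 - p_D)\, \mathcal{I}_0 \Big] \end{aligned} \tag{10}$$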
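Eq. (10) depends only on three scalars, so it is straightforward to evaluate once the integral of the squared prior state pdf is known. The sketch below is one possible transcription; the function name and the numerical inputs are hypothetical. A useful sanity check is that a sensor with zero probability of detection carries no information in a missed detection, so the divergence vanishes.

```python
import math


def cs_divergence_no_return(p, p_d, i0):
    """Cauchy-Schwarz divergence of Eq. (10) for the no-return case.

    p   : prior probability that the object exists
    p_d : probability of detection
    i0  : integral of the squared prior state pdf, Eq. (7)
    """
    miss = 1.0 - p_d
    posterior_sq = (1.0 - p) ** 2 + p ** 2 * miss ** 2 * i0
    prior_sq = (1.0 - p) ** 2 + p ** 2 * i0
    cross = (1.0 - p) ** 2 + p ** 2 * miss * i0
    return 0.5 * math.log(posterior_sq) + 0.5 * math.log(prior_sq) - math.log(cross)


# Hypothetical inputs: existence probability 0.5 and an assumed value of I_0.
d_informative = cs_divergence_no_return(0.5, 0.9, 0.3)
d_blind = cs_divergence_no_return(0.5, 0.0, 0.3)
print(d_informative, d_blind)
```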


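For the case of a single sensor return ($Z_{k+1} = \{z\}$), it can be shown that

$$\begin{aligned} D_{CS}(Z_{k+1} = \{z\}) ={}& \frac{1}{2} \log \Big[ (1 - p)^2 p_F^2\, g^2(z) + p^2 p_F^2 (1 - p_D)^2 g^2(z)\, \mathcal{I}_0 \\ &\qquad\quad + 2 p^2 p_F (1 - p_F)\, p_D (1 - p_D)\, g(z)\, \mathcal{I}_1(z) + p^2 (1 - p_F)^2 p_D^2\, \mathcal{I}_2(z, z) \Big] \\ &+ \frac{1}{2} \log \Big[ (1 - p)^2 + p^2\, \mathcal{I}_0 \Big] \\ &- \log \Big[ (1 - p)^2 p_F\, g(z) + p^2 p_F (1 - p_D)\, g(z)\, \mathcal{I}_0 + p^2 (1 - p_F)\, p_D\, \mathcal{I}_1(z) \Big] \end{aligned} \tag{11}$$

where $g(z)$ is the spatial likelihood distribution function that a clutter point generated the measurement $z$. For the case of two sensor returns ($Z_{k+1} = \{z_1, z_2\}$), it can be shown that

$$\begin{aligned} D_{CS}(Z_{k+1} = \{z_1, z_2\}) ={}& \frac{1}{2} \log \Big[ g^2(z_1)\, \mathcal{I}_2(z_2, z_2) + g^2(z_2)\, \mathcal{I}_2(z_1, z_1) + 2 g(z_1) g(z_2)\, \mathcal{I}_2(z_1, z_2) \Big] \\ &+ \frac{1}{2} \log \Big[ (1 - p)^2 + p^2\, \mathcal{I}_0 \Big] \\ &- \log \Big[ p\, g(z_1)\, \mathcal{I}_1(z_2) + p\, g(z_2)\, \mathcal{I}_1(z_1) \Big] \end{aligned} \tag{12}$$

where $g(z_1)$ is the spatial likelihood distribution function that a clutter point generated the measurement $z_1$, and similarly for $g(z_2)$.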

Up to this point, no explicit forms of the pdfs involved in the computation of the Cauchy-Schwarz divergence have been introduced, rendering the preceding results completely general outside of the specialization to the 0-1 problem. To obtain solutions which are readily implementable in computations, however, it is useful to specify forms of the involved pdfs so as to obtain closed-form solutions for Eqs. (10)-(12). Specifically, this means that the forms of $f_{k+1}(z|x)$ and $f_{k+1|k}(x|Z^{(k)})$ need to be prescribed so that the integral terms of Eqs. (7)-(9) may be computed and utilized in Eqs. (10)-(12).


A.  Closed-Form  Solution  of  the  Cauchy-Schwarz  Divergence 

To obtain closed-form solutions to the Cauchy-Schwarz divergence equations, we first assume that the prior pdf, $f_{k+1|k}(x|Z^{(k)})$, and the measurement pdf, $f_{k+1}(z|x)$, are represented by a Gaussian mixture and by a Gaussian, respectively, such that

$$f_{k+1|k}(x | Z^{(k)}) = \sum_{i=1}^{L} w_i\, p_g(x; m_i, P_i) \tag{13}$$

$$f_{k+1}(z | x) = p_g(z; Hx, R) \tag{14}$$

where $p_g(y; a, A)$ is used to denote a Gaussian pdf with mean $a$ and covariance $A$. Before proceeding further, it is worth noting two identities regarding multiplying Gaussian pdfs. The product of two Gaussian pdfs of the same random variable is given by an unnormalized Gaussian pdf as [16]

$$p_g(x; a, A)\, p_g(x; b, B) = \Gamma(a, b, A, B)\, p_g(x; c, C) \tag{15}$$

where

$$c = C (A^{-1} a + B^{-1} b)$$

$$C = (A^{-1} + B^{-1})^{-1}$$

$$\Gamma(a, b, A, B) = |2\pi (A + B)|^{-1/2} \exp\left\{ -\frac{1}{2} (a - b)^T (A + B)^{-1} (a - b) \right\}$$

The second identity states that for $H$, $R$, $m$, and $P$ of matching dimensions with $R$ and $P$ positive definite [17]

$$p_g(z; Hx, R)\, p_g(x; m, P) = Q(z; H, m, P, R)\, p_g(x; \mu, S) \tag{16}$$

where

$$\mu = m + K (z - Hm)$$

$$S = P - KHP$$

$$K = P H^T (H P H^T + R)^{-1}$$

$$Q(z; H, m, P, R) = p_g(z; Hm, H P H^T + R)$$
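Because the Gaussian on the right-hand side of each identity integrates to one, integrating the left-hand side over x should return the scalar factor alone. The sketch below checks both identities for scalar quantities, where the factor of Eq. (15) reduces to a Gaussian pdf in a with mean b and variance A + B. The helper names, the trapezoidal quadrature, and all numerical values are assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Scalar Gaussian pdf with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gamma_factor(a, b, var_a, var_b):
    """Scalar form of the factor Gamma(a, b, A, B) in Eq. (15)."""
    return gaussian_pdf(a, b, var_a + var_b)


def q_factor(z, h, m, var_p, var_r):
    """Scalar form of Q(z; H, m, P, R) in Eq. (16)."""
    return gaussian_pdf(z, h * m, h * var_p * h + var_r)


def integrate(func, lo, hi, steps=40000):
    """Trapezoidal rule on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * width)
    return width * total


# Identity (15): product of two Gaussians in the same variable.
a, var_a, b, var_b = 1.0, 2.0, -0.5, 0.5
lhs_15 = integrate(lambda x: gaussian_pdf(x, a, var_a) * gaussian_pdf(x, b, var_b), -30.0, 30.0)
rhs_15 = gamma_factor(a, b, var_a, var_b)

# Identity (16): Gaussian likelihood times Gaussian prior.
z, h, m, var_p, var_r = 0.7, 2.0, 0.2, 1.5, 0.4
lhs_16 = integrate(lambda x: gaussian_pdf(z, h * x, var_r) * gaussian_pdf(x, m, var_p), -30.0, 30.0)
rhs_16 = q_factor(z, h, m, var_p, var_r)
print(lhs_15, rhs_15, lhs_16, rhs_16)
```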

To obtain closed-form solutions to the Cauchy-Schwarz divergence of Eqs. (10)-(12), we only need to find closed-form solutions for the integral terms of Eqs. (7)-(9). We begin by noting that $\mathcal{I}_0$ may be written as

$$\mathcal{I}_0 = \int f_{k+1|k}(x | Z^{(k)})\, f_{k+1|k}(x | Z^{(k)})\, dx$$

Then, substituting for $f_{k+1|k}(x|Z^{(k)})$ from Eq. (13) and applying the identity of Eq. (15), it follows that $\mathcal{I}_0$ is given by

$$\mathcal{I}_0 = \sum_{i=1}^{L} \sum_{j=1}^{L} w_i w_j\, \Gamma(m_i, m_j, P_i, P_j) \tag{17}$$
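As an illustration of Eq. (17), the sketch below evaluates the double sum for a scalar Gaussian mixture and compares it with a direct quadrature of the squared mixture. The two-component mixture, the function names, and the quadrature limits are hypothetical choices, not values from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Scalar Gaussian pdf with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def i0_closed_form(weights, means, variances):
    """Eq. (17) for scalar components: sum_i sum_j w_i w_j Gamma(m_i, m_j, P_i, P_j)."""
    total = 0.0
    for w_i, m_i, v_i in zip(weights, means, variances):
        for w_j, m_j, v_j in zip(weights, means, variances):
            # In the scalar case Gamma is a Gaussian pdf with variance P_i + P_j.
            total += w_i * w_j * gaussian_pdf(m_i, m_j, v_i + v_j)
    return total


def i0_quadrature(weights, means, variances, lo, hi, steps=40000):
    """Trapezoidal integral of the squared mixture pdf, Eq. (7)."""
    def squared_mixture(x):
        value = sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        return value * value

    width = (hi - lo) / steps
    total = 0.5 * (squared_mixture(lo) + squared_mixture(hi))
    for i in range(1, steps):
        total += squared_mixture(lo + i * width)
    return width * total


# Hypothetical two-component mixture.
weights, means, variances = [0.6, 0.4], [0.0, 1.5], [1.0, 0.5]
closed = i0_closed_form(weights, means, variances)
numeric = i0_quadrature(weights, means, variances, -30.0, 30.0)
print(closed, numeric)
```

The closed form costs on the order of L squared pdf evaluations, independent of the state dimension, which is the practical advantage over quadrature.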

In a similar approach to that of computing $\mathcal{I}_0$, $\mathcal{I}_1$ may be alternatively expressed as

$$\mathcal{I}_1(z) = \int f_{k+1}(z | x)\, f_{k+1|k}(x | Z^{(k)})\, f_{k+1|k}(x | Z^{(k)})\, dx$$

Substituting for $f_{k+1|k}(x|Z^{(k)})$ from Eq. (13) and $f_{k+1}(z|x)$ from Eq. (14), then applying the identities of Eqs. (15) and (16), we obtain

$$\mathcal{I}_1(z) = \sum_{i=1}^{L} \sum_{j=1}^{L} w_i w_j\, Q(z; H, m_i, P_i, R)\, \Gamma(\mu_i, m_j, S_i, P_j) \tag{18}$$

where

$$\mu_i = m_i + K_i (z - H m_i)$$

$$S_i = P_i - K_i H P_i$$

$$K_i = P_i H^T (H P_i H^T + R)^{-1}$$
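A scalar transcription of Eq. (18) can be checked against quadrature in the same way. In the sketch below the measurement matrix, noise variance, mixture parameters, and measurement value are all hypothetical, and the function names are illustrative.

```python
import math


def gaussian_pdf(x, mean, var):
    """Scalar Gaussian pdf with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def i1_closed_form(z, h, var_r, weights, means, variances):
    """Eq. (18) for a scalar state and a scalar measurement z = h x + n."""
    total = 0.0
    for w_i, m_i, v_i in zip(weights, means, variances):
        innovation_var = h * v_i * h + var_r
        gain = v_i * h / innovation_var             # K_i
        mu_i = m_i + gain * (z - h * m_i)           # updated mean
        s_i = v_i - gain * h * v_i                  # updated variance S_i
        q_i = gaussian_pdf(z, h * m_i, innovation_var)
        for w_j, m_j, v_j in zip(weights, means, variances):
            # Scalar Gamma(mu_i, m_j, S_i, P_j) is a Gaussian with variance S_i + P_j.
            total += w_i * w_j * q_i * gaussian_pdf(mu_i, m_j, s_i + v_j)
    return total


def i1_quadrature(z, h, var_r, weights, means, variances, lo, hi, steps=40000):
    """Trapezoidal integral of f(z|x) times the squared mixture, Eq. (8)."""
    def integrand(x):
        prior = sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        return gaussian_pdf(z, h * x, var_r) * prior * prior

    width = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, steps):
        total += integrand(lo + i * width)
    return width * total


weights, means, variances = [0.6, 0.4], [0.0, 1.5], [1.0, 0.5]
closed = i1_closed_form(0.8, 2.0, 0.3, weights, means, variances)
numeric = i1_quadrature(0.8, 2.0, 0.3, weights, means, variances, -30.0, 30.0)
print(closed, numeric)
```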
Finally, $\mathcal{I}_2$ may be expressed as

$$\mathcal{I}_2(z_1, z_2) = \int f_{k+1}(z_1 | x)\, f_{k+1|k}(x | Z^{(k)})\, f_{k+1}(z_2 | x)\, f_{k+1|k}(x | Z^{(k)})\, dx$$

Once again, by substituting for $f_{k+1|k}(x|Z^{(k)})$ from Eq. (13) and $f_{k+1}(z|x)$ from Eq. (14), then applying the identities of Eqs. (15) and (16), it can be shown that

$$\mathcal{I}_2(z_1, z_2) = \sum_{i=1}^{L} \sum_{j=1}^{L} w_i w_j\, Q(z_1; H, m_i, P_i, R)\, Q(z_2; H, m_j, P_j, R)\, \Gamma(\mu_{1,i}, \mu_{2,j}, S_i, S_j) \tag{19}$$

where

$$\mu_{1,i} = m_i + K_i (z_1 - H m_i)$$

$$\mu_{2,i} = m_i + K_i (z_2 - H m_i)$$

$$S_i = P_i - K_i H P_i$$

$$K_i = P_i H^T (H P_i H^T + R)^{-1}$$

Thus, a closed-form solution to the Cauchy-Schwarz information divergence for the 0-1 problem has been obtained under the assumptions that the state pdf may be represented as a Gaussian mixture and that the measurement pdf may be represented as a Gaussian. To summarize, Eqs. (17), (18), and (19) are utilized to compute $\mathcal{I}_0$, $\mathcal{I}_1(z)$, and $\mathcal{I}_2(z_1, z_2)$, which may then be employed in Eqs. (10)-(12) to compute the Cauchy-Schwarz information divergence, with the specific equation employed being dependent upon whether there were no sensor returns, a single sensor return, or two sensor returns.

IV.  Cauchy-Schwarz  Information  Gain 

The Cauchy-Schwarz information divergence provides a method by which the amount of acquired information regarding the state (both the discrete and continuous components) may be determined given measurement data (i.e. no return, a single return, or two returns). It does not, however, provide a measure that can be used to assess future performance, i.e. in the case that no data is yet available. For this reason, we define the Cauchy-Schwarz information gain to be the expected value of the information divergence over all possible measurement outcomes. Since $D_{CS} : \mathcal{X} \times \mathcal{Z} \mapsto \mathbb{R}^+$, it is seen that by Eq. (3), the information gain may be written as

$$\begin{aligned} G_{CS} ={}& f(Z_{k+1} = \emptyset)\, D_{CS}(Z = \emptyset) + \int f(Z_{k+1} = \{z\})\, D_{CS}(Z = \{z\})\, dz \\ &+ \frac{1}{2} \int f(Z_{k+1} = \{z_1, z_2\})\, D_{CS}(Z = \{z_1, z_2\})\, dz_1\, dz_2 , \end{aligned}$$

where $f(Z_{k+1} = \emptyset)$, $f(Z_{k+1} = \{z\})$, and $f(Z_{k+1} = \{z_1, z_2\})$ are the Bayes factors for the cases of no return, a single return, and two returns, respectively. The forms of the Bayes factors are given in the Appendix and discussed in more detail in Reference [15]. Notice here that we generalize the conventional definition of expectations to compute the expected hybrid information divergence. This is a mathematically well-defined operation since the information divergence function is a real-valued set function. This generalization is mathematically ill-defined if the function one is taking an expectation of is set-valued, say in attempting to compute the expected value of a set-valued random variable $X$. For more on this, see Chapter 16 of Reference [8].

Letting $p_\emptyset = (1 - p_F)\left[(1 - p) + p(1 - p_D)\right]$, $p_f = p\, p_D (1 - p_F)$, $p_g = p_F \left[(1 - p) + p(1 - p_D)\right]$, and $p_{fg} = p\, p_F\, p_D$, it follows that the Cauchy-Schwarz information gain may be expressed as

$$G_{CS} = p_\emptyset E_\emptyset + p_f E_f(z) + p_g E_g(z) + \frac{1}{2} p_{fg} E_{fg}(z_1, z_2) + \frac{1}{2} p_{fg} E_{fg}(z_2, z_1) \tag{20}$$

where

$$E_\emptyset = D_{CS}(Z = \emptyset)$$

$$E_f(z) = \int f_{k+1}(z)\, D_{CS}(Z = \{z\})\, dz \tag{21}$$

$$E_g(z) = \int g(z)\, D_{CS}(Z = \{z\})\, dz \tag{22}$$

$$E_{fg}(z_1, z_2) = \int f_{k+1}(z_1)\, g(z_2)\, D_{CS}(Z = \{z_1, z_2\})\, dz_1\, dz_2 \tag{23}$$

and $f_{k+1}(z)$ is the spatial likelihood distribution function that the target generated the measurement, which is given by

$$f_{k+1}(z) = \int f_{k+1}(z | x)\, f_{k+1|k}(x | Z^{(k)})\, dx$$

Furthermore, recall that $g(z)$ is the spatial likelihood distribution function that a clutter point generated the measurement. As before, by substituting for $f_{k+1|k}(x|Z^{(k)})$ from Eq. (13) and $f_{k+1}(z|x)$ from Eq. (14), then applying the identities of Eqs. (15) and (16), it follows that $f_{k+1}(z)$ can be written as

$$f_{k+1}(z) = \sum_{i=1}^{L} w_i\, Q(z; H, m_i, P_i, R)$$

In general, the integral equations of Eqs. (21)-(23) admit no known closed-form solutions, and so we compute them via Monte Carlo integration. Additionally, it should be noted that the information gain relationship in Eq. (20) naturally decomposes into contributions from no return, which is represented in the first term, a single return (either target or clutter generated) through the second and third terms, and two returns in the fourth and fifth terms.
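The sketch below illustrates two pieces of this computation. First, it evaluates the four outcome weights of Eq. (20) and confirms that they sum to one (the two-return weight enters twice with a factor of one half). Second, it shows a generic Monte Carlo estimate of an expectation of the form used in Eqs. (21)-(23): draw samples from the weighting density and average the integrand. The function names, the fixed seed, the sample size, and the toy integrand are assumptions; in practice the integrand would be the divergence of Eq. (11) or (12) and the sampler would draw from the target or clutter measurement density.

```python
import random


def outcome_weights(p, p_d, p_f):
    """Weights of Eq. (20): no return, target return, clutter return, two returns."""
    no_detection = (1.0 - p) + p * (1.0 - p_d)
    p_empty = (1.0 - p_f) * no_detection
    p_target = p * p_d * (1.0 - p_f)
    p_clutter = p_f * no_detection
    p_both = p * p_f * p_d
    return p_empty, p_target, p_clutter, p_both


def monte_carlo_expectation(sampler, integrand, num_samples, seed=0):
    """Estimate the integral of density(z) * integrand(z) dz by sampling z from the density.

    sampler takes a random.Random instance and returns one sample.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        total += integrand(sampler(rng))
    return total / num_samples


weights = outcome_weights(0.7, 0.9, 0.6)
print(sum(weights))

# Toy check with a known answer: for z uniform on [0, 1], the mean of z^2 is 1/3.
estimate = monte_carlo_expectation(lambda rng: rng.random(), lambda z: z * z, 20000)
print(estimate)
```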

V.  Results 

Given the preceding results on computing both the information divergence and the associated information gain, there are several ways in which the methods may be applied. For instance, Reference [15] illustrates the information divergence as a function of time, illustrating the effectiveness of measurements in a multi-target tracking problem, and Reference [6] illustrates the information gain as a function of time as a potential mechanism for determining the times at which measurements can be taken to obtain maximum information gain.

In the sequel, we consider a fixed point in time (with a fixed continuous state pdf) and use the information gain as a method for determining the sensitivities to variations in the prior probability of object existence, the probability of detection, the probability of false alarm, and the measurement noise. Furthermore, to illustrate the flexibility of the developed methods, we apply the information gain calculations to two scenarios: 1) a target which is represented by a Gaussian distribution (in the continuous state) with a nearby object that can generate false alarms, also with a Gaussian distribution, and 2) a target which is represented by a Gaussian mixture distribution (in the continuous state) with a uniform clutter distribution defined over a portion of the sensor field of view.

For both problems considered, the dynamical system model is that of a planar two-body orbital motion problem with a sensor that is on the surface of the Earth and can take measurements of the target's position. That is, the dynamical system is given by

$$\begin{bmatrix} \dot{r} \\ \dot{v} \end{bmatrix} = \begin{bmatrix} v \\ -\mu\, r\, \|r\|^{-3} \end{bmatrix}$$

where $r$ is the inertial position of the object, $v$ is the inertial velocity of the object, and $\mu$ is the gravitational parameter of the Earth. Additionally, the measurements are taken to be of the form

$$z = Hx + n ,$$

where $H$ is such that $Hx = r$, and $n$ is the measurement noise, which is taken to be zero-mean with covariance $R = \sigma^2 I_{2 \times 2}$.
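The planar two-body model and the position measurement can be written compactly as follows. This is a minimal sketch: the state ordering (x, y, vx, vy), the units, the function names, and the geostationary-radius test case are assumptions; the value of Earth's gravitational parameter is the standard 398600.4418 km^3/s^2.

```python
import math

MU_EARTH = 398600.4418  # Earth's gravitational parameter in km^3/s^2


def two_body_derivative(state):
    """Time derivative of the planar two-body state.

    state = (x, y, vx, vy), with position in km and velocity in km/s.
    The velocity derivative is -mu * r / |r|^3.
    """
    x, y, vx, vy = state
    radius_cubed = math.hypot(x, y) ** 3
    return (vx, vy, -MU_EARTH * x / radius_cubed, -MU_EARTH * y / radius_cubed)


def measure_position(state, noise=(0.0, 0.0)):
    """Measurement z = H x + n, where H selects the position components."""
    return (state[0] + noise[0], state[1] + noise[1])


# Illustrative circular orbit at a radius of 42164 km.
radius = 42164.0
speed = math.sqrt(MU_EARTH / radius)
state = (radius, 0.0, 0.0, speed)
derivative = two_body_derivative(state)
print(derivative, measure_position(state))
```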

A schematic representing the observational geometry for the first scenario considered is given in Figure 1. The continuous state target pdf is characterized by a Gaussian distribution with 1 [km] position uncertainty and 1 [m/s] velocity uncertainty. Additionally, the mean is described by an apoapsis of 42,100 [km] and an eccentricity of 0.2. The clutter model in this case is represented by a nearby object that generates false returns with a pdf of $g(z) = p_g(z; m_c, R_c)$, where $m_c$ is chosen to be 100 [m] from the true object in both $x$ and $y$ positions and $R_c = (\sigma_c)^2 I_{2 \times 2}$ with $\sigma_c = 25$ [km]. The information gain as a function of the probability of detection is shown in Figure 2 for several values of the prior probability of object existence. This shows that for all values of $p$, an increase in $p_D$ leads to higher information gain. Additionally, it is seen that for high values of $p_D$ a larger information gain results from smaller $p$, which is largely due to the information gained on the probability of object existence. In Figure 3, the information gain is shown as a function of prior probability of object existence for several values of the measurement noise, $\sigma$. Here, it is seen that lower measurement noise leads to higher information gain across the range of $p$. Additionally, an interesting inflection point is observed for low values of $p$. To explain this effect, we show the contributions to the information gain in Figure 4, which correspond to each of the terms in Eq. (20). This shows that for $p = 0$, the information gain is zero, but for small non-zero values of $p$, the no return and clutter return contributions to the information gain are high. As $p$ increases, these two contributions quickly decrease and the remaining contributions become dominant. The trade-off between the two trends causes the inflection observed in the information gain of Figure 3.


Fig.  1.  Schematic  of  the  Gaussian  target/Gaussian  clutter  model.  The  black 
contour  lines  represent  the  Gaussian  target  pdf,  the  gray  contour  lines  represent 
the  Gaussian  clutter  pdf,  and  the  straight  lines  represent  the  sensor  field  of 
view. 

A schematic representing the observational geometry for the second scenario considered is given in Figure 5. The continuous state target pdf is generated by taking the continuous state target pdf from the first scenario and propagating it forward for 15 hours using the AEGIS algorithm of Reference [18]. The clutter model in this case is represented by a uniform distribution within the field of view of the sensor. The information gain as a function of the probability of detection is shown in Figure 6 for several values of the prior probability of object existence. In contrast to the first scenario, the information gain curves for different values of $p$ do not intersect. In Figure 7, the information gain is shown as a function of prior probability of object existence for several values of the measurement noise, $\sigma$. Here, it is seen that lower measurement noise leads to higher information gain across the range of $p$. Similar to the first scenario, an inflection point is observed but with much less prominence in this case. As before, the appearance of the inflection is due to the trade-off between the dominance of no return and clutter return information gain for small $p$ and target return and two returns for large $p$.

Fig. 2. Information gain as a function of probability of detection, with the measurement noise standard deviation taken to be $\sigma = 1$ [km], and the probability of false alarm taken to be $p_F = 0.6$.

Fig. 3. Information gain as a function of prior probability of object existence, with the probability of detection taken to be $p_D = 0.7$, and the probability of false alarm taken to be $p_F = 0.6$.

Fig. 4. Contribution of the terms in Eq. (20) to the information gain as a function of prior probability of object existence, with the probability of detection taken to be $p_D = 0.7$, and the probability of false alarm taken to be $p_F = 0.6$.


Fig. 5. Schematic of the Gaussian mixture target/uniform clutter model. The
black contour lines represent the Gaussian mixture target pdf, the solid gray
region represents the uniform clutter pdf, and the straight lines represent the
sensor field of view.


Fig. 6. Information gain as a function of probability of detection, with
the measurement noise standard deviation taken to be $\sigma = 1$ [km], and the
probability of false alarm taken to be $p_F = 0.6$.

Fig. 7. Information gain as a function of prior probability of object existence,
with the probability of detection taken to be $p_D = 0.7$, and the probability
of false alarm taken to be $p_F = 0.6$.


VI. Conclusions

A method for determining the information gain in multi-target
tracking problems has been developed and applied to a
simplified 0-1 target tracking problem. A closed-form solution
for the Cauchy-Schwarz information divergence in the 0-1
target tracking problem was obtained under the assumption
that the continuous state pdf is represented by a Gaussian
mixture distribution and that the measurement pdf is represented
by a Gaussian distribution. The information gain was
evaluated for two scenarios in space object tracking with differing
models of the continuous state distribution and the clutter pdf.
In all cases, it was shown that lower measurement
noise leads to higher information gain. Similarly, it was shown
that higher probability of detection leads to higher information
gain. Finally, it was found that the prior probability of object
existence has the most complex relationship to information
gain. In some cases, a lower prior probability coupled with
a high probability of detection leads to significantly more
information gain, but this is not always the case because the
information gain is highly situation dependent, i.e., it depends
strongly on the observation geometry, observation quality,
and prior continuous state distribution.
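
To illustrate why a Gaussian mixture assumption yields a closed form, the following minimal sketch evaluates the Cauchy-Schwarz divergence between two one-dimensional Gaussian mixtures. It covers only the purely continuous case, not the hybrid discrete/continuous expression developed in this paper, and it assumes the common definition D_CS(p, q) = -ln( ∫pq / sqrt(∫p² ∫q²) ). The closed form relies on the standard identity that the integral of the product of two Gaussians N(x; m1, v1) and N(x; m2, v2) equals N(m1; m2, v1 + v2). The function names (`gauss`, `gm_inner`, `cs_divergence`) and the mixture parameters are hypothetical and chosen for illustration only.

```python
import numpy as np

def gauss(x, mean, var):
    """Scalar Gaussian pdf N(x; mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def gm_inner(p, q):
    """Closed-form integral of p(x) q(x) dx for two 1-D Gaussian mixtures.

    Each mixture is a tuple (weights, means, variances).
    """
    w1, m1, v1 = (np.asarray(a, dtype=float) for a in p)
    w2, m2, v2 = (np.asarray(a, dtype=float) for a in q)
    # Pairwise component integrals: N(m1_i; m2_j, v1_i + v2_j)
    kernel = gauss(m1[:, None], m2[None, :], v1[:, None] + v2[None, :])
    return float(w1 @ kernel @ w2)

def cs_divergence(p, q):
    """Cauchy-Schwarz divergence between two 1-D Gaussian mixtures."""
    return -np.log(gm_inner(p, q) / np.sqrt(gm_inner(p, p) * gm_inner(q, q)))

# Illustrative (assumed) mixtures: a two-component prior and a single Gaussian
p = ([0.5, 0.5], [-1.0, 2.0], [1.0, 0.5])
q = ([1.0], [0.0], [2.0])
print(cs_divergence(p, q))
```

The divergence is symmetric in its arguments and vanishes when the two mixtures coincide, which makes for a quick sanity check of any implementation.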
Appendix

For completeness, we summarize the 0-1 equations derived
from FISST. The development of the following equations is
treated in Reference [15]. The equations are presented for
each of the possible measurement outcomes: 1) no return, 2)
a single return, and 3) two returns. For each outcome, the first
two equations represent the prior multi-target density function
for the cases that there is no target and that there is a target,
respectively. The last two equations represent the multi-target
likelihood function, again when there is no target and
when there is a target, respectively.

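Case: $Z_{k+1} = \emptyset$

$$
\begin{aligned}
f_{k+1|k}(X = \emptyset \mid Z^{(k)}) &= (1 - p) \\
f_{k+1|k}(X = \{x\} \mid Z^{(k)}) &= p\, f_{k+1|k}(x \mid z^{(k)}) \\
f_{k+1}(Z_{k+1} = \emptyset \mid X = \emptyset) &= (1 - p_F) \\
f_{k+1}(Z_{k+1} = \emptyset \mid X = \{x\}) &= (1 - p_F)(1 - p_D)
\end{aligned}
$$

Case: $Z_{k+1} = \{z\}$

$$
\begin{aligned}
f_{k+1|k}(X = \emptyset \mid Z^{(k)}) &= (1 - p) \\
f_{k+1|k}(X = \{x\} \mid Z^{(k)}) &= p\, f_{k+1|k}(x \mid z^{(k)}) \\
f_{k+1}(Z_{k+1} = \{z\} \mid X = \emptyset) &= p_F\, g(z) \\
f_{k+1}(Z_{k+1} = \{z\} \mid X = \{x\}) &= p_F (1 - p_D)\, g(z) + p_D (1 - p_F)\, f_{k+1}(z \mid x)
\end{aligned}
$$

Case: $Z_{k+1} = \{z_1, z_2\}$

$$
\begin{aligned}
f_{k+1|k}(X = \emptyset \mid Z^{(k)}) &= (1 - p) \\
f_{k+1|k}(X = \{x\} \mid Z^{(k)}) &= p\, f_{k+1|k}(x \mid z^{(k)}) \\
f_{k+1}(Z_{k+1} = \{z_1, z_2\} \mid X = \emptyset) &= 0 \\
f_{k+1}(Z_{k+1} = \{z_1, z_2\} \mid X = \{x\}) &= p_F\, p_D \left( g(z_1)\, f_{k+1}(z_2 \mid x) + g(z_2)\, f_{k+1}(z_1 \mid x) \right)
\end{aligned}
$$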
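
As a consistency check on the likelihoods listed above, the following short derivation sketches how each one normalizes. It assumes the standard FISST set integral, truncated here at two returns because the model admits no more than two, and it assumes that the clutter pdf $g(z)$ and the single-target likelihood $f_{k+1}(z \mid x)$ each integrate to one.

$$
\int f(Z)\,\delta Z = f(\emptyset) + \int f(\{z\})\,dz + \frac{1}{2} \iint f(\{z_1, z_2\})\,dz_1\,dz_2
$$

For $X = \emptyset$:

$$
(1 - p_F) + p_F + 0 = 1
$$

For $X = \{x\}$:

$$
(1 - p_F)(1 - p_D) + \left[ p_F (1 - p_D) + p_D (1 - p_F) \right] + \frac{1}{2}\, p_F\, p_D\, (1 + 1) = 1
$$

The factor of one half in the two-return term compensates for the two orderings of $\{z_1, z_2\}$, which is why the two-return likelihood is written symmetrically in $z_1$ and $z_2$.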

In  addition  to  the  prior  density  and  likelihood  relationships,  the 
multi-target  Bayes  factors  are  given  for  the  no  return,  single 
return,  and  two  return  measurement  outcomes,  respectively,  by 

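$$
\begin{aligned}
f(Z_{k+1} = \emptyset) &= (1 - p_F)(1 - p\, p_D) \\
f(Z_{k+1} = \{z\}) &= p_F (1 - p\, p_D)\, g(z) + p\, p_D (1 - p_F)\, f_{k+1}(z) \\
f(Z_{k+1} = \{z_1, z_2\}) &= p\, p_F\, p_D \left[ g(z_1)\, f_{k+1}(z_2) + g(z_2)\, f_{k+1}(z_1) \right]
\end{aligned}
$$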
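
The Bayes factors above can be checked numerically against the prior densities and likelihoods from which they follow. The sketch below does this for a scalar state under several assumptions that are not taken from the paper: a single Gaussian prior on $x$, a linear measurement $z = x + \text{noise}$, a Gaussian clutter pdf $g$, and the predicted measurement density $f_{k+1}(z)$ taken to be the integral of $f_{k+1}(z \mid x)$ against the prior. All function names and numerical values are hypothetical, apart from $p_D = 0.7$ and $p_F = 0.6$, which are borrowed from the figure captions.

```python
import numpy as np

def gauss(x, mean, var):
    """Scalar Gaussian pdf N(x; mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def likelihood(Z, x, p_D, p_F, g, meas_var):
    """Return (f(Z | X = empty), f(Z | X = {x})) for Z with 0, 1, or 2 returns."""
    f = lambda z: gauss(z, x, meas_var)  # assumed single-target likelihood f(z | x)
    if len(Z) == 0:
        return 1.0 - p_F, (1.0 - p_F) * (1.0 - p_D)
    if len(Z) == 1:
        (z,) = Z
        return p_F * g(z), p_F * (1.0 - p_D) * g(z) + p_D * (1.0 - p_F) * f(z)
    z1, z2 = Z
    return 0.0, p_F * p_D * (g(z1) * f(z2) + g(z2) * f(z1))

def bayes_factor(Z, p, p_D, p_F, g, prior_mean, prior_var, meas_var):
    """Closed-form f(Z) from the appendix, for an assumed Gaussian prior on x."""
    f = lambda z: gauss(z, prior_mean, prior_var + meas_var)  # predicted f(z)
    if len(Z) == 0:
        return (1.0 - p_F) * (1.0 - p * p_D)
    if len(Z) == 1:
        (z,) = Z
        return p_F * (1.0 - p * p_D) * g(z) + p * p_D * (1.0 - p_F) * f(z)
    z1, z2 = Z
    return p * p_F * p_D * (g(z1) * f(z2) + g(z2) * f(z1))

def bayes_factor_numeric(Z, p, p_D, p_F, g, prior_mean, prior_var, meas_var):
    """f(Z) = (1 - p) f(Z | empty) + p * integral of f(Z | {x}) f(x) dx, by quadrature."""
    half = 12.0 * np.sqrt(prior_var)
    x = np.linspace(prior_mean - half, prior_mean + half, 200001)
    dx = x[1] - x[0]
    lik_empty, lik_target = likelihood(Z, x, p_D, p_F, g, meas_var)
    integral = np.sum(lik_target * gauss(x, prior_mean, prior_var)) * dx
    return (1.0 - p) * lik_empty + p * integral

# Illustrative (assumed) scenario: Gaussian clutter centered at zero
clutter = lambda z: gauss(z, 0.0, 25.0)
args = dict(p=0.5, p_D=0.7, p_F=0.6, g=clutter,
            prior_mean=1.0, prior_var=4.0, meas_var=1.0)
for Z in [(), (0.8,), (0.8, -3.0)]:
    print(len(Z), bayes_factor(Z, **args), bayes_factor_numeric(Z, **args))
```

For each of the three measurement outcomes, the closed-form and quadrature values should agree to within the quadrature error, which confirms that each Bayes factor is the prior-weighted combination of the no-target and target likelihoods.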




